Workshop on Data Science in Biomedicine
Abstract
The results of genome-wide association studies (GWAS) indicate that most complex diseases are highly polygenic: (1) associated alleles typically have small effect sizes, (2) the number of significantly associated alleles has continued to increase with increasing sample size, and (3) genome-wide methods such as random effects models have detected much greater total genetic effects than are accounted for by genome-wide significant SNPs. The genome-wide significant loci so far detected for complex diseases therefore likely represent the tip of the iceberg of the total polygenic components of these diseases. New statistical methods, informed by genome organization and function, will be necessary to characterize the polygenic components of complex diseases. Examples of such methods will be presented.

10:10 – 10:40 Yingying Wei, Statistics, The Chinese University of Hong Kong
Title: A Scalable Integrative Model for Heterogeneous Genomic Data Types under Multiple Conditions
Abstract: A key problem in biology is how the same copy of a genome within a person can give rise to hundreds of cell types. Ample evidence indicates that multiple elements, such as transcription factor binding, histone modification, and DNA methylation, all contribute to the regulation of gene expression levels in different cell types. Therefore, it is crucial to understand how these heterogeneous regulatory elements collaborate, how the cooperation at a given genomic region changes across diverse cell lines, and how such dynamic cooperation patterns across cell lines vary along the whole genome. Here, we propose a scalable hierarchical probabilistic generative model to cluster genomic regions according to the dynamic changes of their open chromatin and DNA methylation status across cell types. The model will overcome the exponential growth of the parameter space as the number of integrated cell types increases. The fitted results of the model will provide a genome-wide, region-specific, cell-line-specific open chromatin and DNA methylation landscape map. This is joint work with Mai Shi.

10:40 – 11:00 Break

11:00 – 11:30 Can Yang
Title: IPAC: A Flexible Statistical Approach to Integrating Pleiotropy and Annotation for Characterizing Functional Roles of Genetic Variants that Underlie Human Complex Phenotypes
Abstract: Recent international projects, such as the Encyclopedia of DNA Elements (ENCODE) project, the Roadmap project and the Genotype-Tissue Expression (GTEx) project, have generated vast amounts of genomic annotation data, e.g., epigenomic and transcriptomic measurements. On the other hand, increasing evidence suggests that seemingly unrelated phenotypes can share common genetic factors, which is known as pleiotropy.
A big challenge in integrative analysis is how to put pleiotropy and annotation into a unified model and automatically select the most relevant genomic features from a potentially huge set of genomic features. In this talk, we introduce a flexible statistical approach, named IPAC, for integrating pleiotropy and annotation to characterize the functional roles of genetic variants that underlie human complex phenotypes. IPAC enabled us to automatically perform feature selection from a large number of annotated genomic features and naturally incorporate the selected features for prioritization of genetic risk variants. IPAC not only demonstrated remarkable computational efficiency (e.g., it took about 2 to 3 minutes to handle millions of genetic variants and thousands of functional annotations), but also allowed rigorous statistical inference of the model parameters and false discovery rate control in risk variant prioritization. With the IPAC approach, we performed integrative analysis of genome-wide association studies on multiple complex human traits and genome-wide annotation resources, e.g., Roadmap epigenome. The analysis results revealed interesting regulatory patterns of risk variants. These findings deepen our understanding of the genetic architectures of complex traits. This is joint work with Dongjun Chung, Cong Li, Jin Liu, Xiang Wan, Qian Wang, Chao Yang, and Hongyu Zhao.

11:30 – 12:00 Qiongshi Lu, Biostatistics, Yale University
Title: Post-GWAS prioritization through integrated analysis of genomic functional annotation
Abstract: Genome-wide association studies (GWAS) have been a great success in the past decade, with tens of thousands of loci identified as associated with many complex diseases in humans. However, significant challenges remain in both identifying new risk loci and interpreting results.
The Bonferroni-corrected significance level is known to be conservative, leading to insufficient statistical power when the effect size at a risk locus is small to moderate. The complex structure of linkage disequilibrium also makes it challenging to separate causal variants from nonfunctional ones in large haplotype blocks. We describe GenoWAP (Genome Wide Association Prioritizer), a post-GWAS prioritization method that integrates genomic functional annotation and GWAS test statistics. The effectiveness of GenoWAP is demonstrated through its applications to GWAS results for Crohn’s disease and schizophrenia using the largest studies available. After prioritization based on a subset of all the available samples, highly ranked loci show substantially stronger signals in the whole dataset than the top loci before prioritization. At the single nucleotide polymorphism (SNP) level, top-ranked SNPs after prioritization have both higher replication rates and consistently stronger enrichment of eQTLs. Within each risk locus, GenoWAP is able to distinguish real signal sources from groups of correlated SNPs. The GenoWAP software is available at http://genocanyon.med.yale.edu/GenoWAP

12:00 – 14:00 Lunch break

14:00 – 14:30 Shuangge Ma, Biostatistics, Yale University
Title: Promoting Similarity of Sparsity Structures in Integrative Analysis
Abstract: For data with high-dimensional covariates but small to moderate sample sizes, the analysis of single datasets often generates unsatisfactory results. The integrative analysis of multiple independent datasets provides an effective way of pooling information and outperforms single-dataset analysis and some alternative multi-dataset approaches, including meta-analysis. Under certain scenarios, multiple datasets are expected to share common important covariates, that is, their models have similar sparsity structures. However, the existing methods do not have a mechanism to promote the similarity of sparsity structures in integrative analysis. In this study, we consider penalized variable selection and estimation in integrative analysis. We develop a penalization-based approach, which is the first to explicitly promote the similarity of sparsity structures. Computationally, it is realized using a coordinate descent algorithm. Theoretically, it has the much desired consistency properties.
In simulation, it significantly outperforms the competing alternative when the models in multiple datasets share common important covariates. Its performance is better than or similar to that of the alternative when there is no shared important covariate. Thus it provides a “safe” choice for data analysis. Applying the proposed method to three lung cancer datasets with gene expression measurements leads to models with significantly more similar sparsity structures and better prediction performance.

14:30 – 15:00 Bin Nan, Statistics, University of Michigan
Title: Large covariance/correlation matrix estimation for temporal data
Abstract: We consider the estimation of high-dimensional covariance and correlation matrices under slow-decaying temporal dependence. For generalized thresholding estimators, convergence rates are obtained and properties of sparsistency and sign-consistency are established. The impact of temporal dependence on convergence rates is also investigated. An intuitive cross-validation method is proposed for the thresholding parameter selection, which shows good performance in simulations. Convergence rates are also obtained for the banding method if the covariance or correlation matrix is bandable. The considered temporal dependence has longer memory than that considered in the current literature and has particular implications for analyzing resting-state fMRI data in brain connectivity studies. This is joint work with Hai Shu.
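
As a rough illustration of what a thresholding estimator of a covariance matrix looks like, the following minimal sketch applies soft thresholding, one common member of the generalized thresholding family, to the off-diagonal entries of a sample covariance matrix. It is not the speaker's implementation: the function name `soft_threshold_covariance`, the parameter `lam`, and the simulated data are assumptions made for illustration, and neither the cross-validated choice of the threshold nor any treatment of temporal dependence is reproduced here.

```python
import numpy as np

def soft_threshold_covariance(X, lam):
    """Soft-threshold the off-diagonal entries of the sample covariance of X.

    X   : (n_samples, n_features) array; each row is one observation.
    lam : non-negative threshold (hypothetical tuning parameter).
    """
    S = np.cov(X, rowvar=False)  # sample covariance, features in columns
    # Soft thresholding: shrink each entry toward zero by lam, zeroing small ones.
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    # Restore the variances so that only off-diagonal entries are thresholded.
    np.fill_diagonal(T, np.diag(S))
    return T

# Simulated data, for illustration only (independent features).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
est = soft_threshold_covariance(X, lam=0.1)
```

In practice `lam` would be tuned from the data, for example by a cross-validation scheme of the kind the abstract mentions; the fixed value above is only a placeholder.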